TODO 2. standaradize style for all slides titles 3. name all objects with sf in the name (for San Francisco) SF instead (to disambiguate from sf pkg)
Spring 2019
TODO 2. standaradize style for all slides titles 3. name all objects with sf in the name (for San Francisco) SF instead (to disambiguate from sf pkg)
Open https://github.com/dlab-geo/r-geospatial-workshop
Start RStudio and open a new script file
Install required libraries in RStudio - if you do not have them already
install.packages(
c("sf","sp","tmap","classInt","RColorBrewer",
"ggplot2","leaflet", "ggmap"), dependencies=TRUE
)
Intro to working with geospatial data in R
Key goal is familiarity, competency takes lots of practice
Who are you?
Why are you here?
Open one of the tutorial files in a web browser
Slides ./docs/RGeo_pt1.html
Raw Code ./docs/RGeo_pt1.Rmd
Make sure you can cut and paste into RStudio # Geographic Data
are data about locations on or near the surface of the Earth.
![]()
represent location more specifically with coordinates
46.130479, -117.134167

Coordinates only make sense when associated with a CRS!
![]()
Geographic Coordinates: Latitude and Longitude
Define:
the shape of the Earth
the origin (0,0 point)
the relationship between the system and the real world
the units
Because of variations in these, there are many geographic CRSs!
The World Geodetic System of 1984 is the most widely used geographic coordinate reference system.
WGS84 is the default CRS for most GIS software
Almost all longitude and latitude data are assumed to be WGS84 unless otherwise specified
Historical data much trickier
You can
dynamically determine spatial metrics like area, length, distance and direction
spatial relationships like intersects, inside, contains, etc
and
Spatial data is a broader term than geographic data.
Methods for working with spatial data are valid for geospatial data
All spatial data are encoded with some type of coordinate reference system
Geospatial data require the CRS to be a model of locations on the Earth
Vector and Raster Data
Points, lines and Polygons

Regular grid of cells (or pixels)

We cover raster data only in Day 3 of this workshop
software that can import, create, store, edit, visualize and analyze geospatial data
represented as geometric data objects referenced to the surface of the earth via CRSs
with methods to operate on those representations
We call software for working with geospatial data GIS
Geographic Information System
This term is commonly associated with desktop software applications.
Desktop GIS - ArcGIS, QGIS
Spatial Databases - PostgreSQL/PostGIS
Web-based GIS - ArcGIS Online, CARTO
Software geospatial data support - Tableau
Programming languages with geospatial data support
R, Python, JavascriptYou already use R
Reproducibility
Free & Open Source
Strong support for geospatial data and analysis
Cutting edge
Make sure you have set your working directory to the location of the workshop files.
Make sure you have the packages we are going to use installed.
our_packages<- c("sf","sp","tmap","classInt","RColorBrewer",
"ggplot2","leaflet", "ggmap","dplyr")
for (i in our_packages){
if ( i %in% rownames(installed.packages()) == FALSE) {
install.packages(i)
}
}
There are many approaches to and packages for working with geospatial data in R.
One approach is to keep it simple and store geospatial data in a data frame.
This approach is most common when
the data are point data in CSV files and
you want to map rather than spatially transform or analyze the data
sf_properties_25ksample.csv
San Francisco Open Data Portal https://data.sfgov.org
This data set includes the Office of the Assessor-Recorder’s secured property tax roll spanning from 2007 to 2016.
We are using a subset of these data as a proxy for home values.
sfhomes <- read.csv('data/sf_properties_25ksample.csv',
stringsAsFactors = FALSE)
# Take a look at first 5 rows and a few of the columns
sfhomes[1:5,c("YearBuilt","totvalue","AreaSquareFeet","Neighborhood",
"NumBedrooms")]
Make sure your working directory is set to the folder where you downloaded the workshop files!
sfhomes <- read.csv('data/sf_properties_25ksample.csv',
stringsAsFactors = FALSE)
# Take a look at first 5 rows and a few of the columns
sfhomes[1:5,c("YearBuilt","totvalue","AreaSquareFeet","Neighborhood",
"NumBedrooms")]
## YearBuilt totvalue AreaSquareFeet Neighborhood NumBedrooms ## 1 1923 1031391 2250 West of Twin Peaks 0 ## 2 1909 775300 5830 Presidio Heights 3 ## 3 1915 1116772 1350 Inner Richmond 2 ## 4 1947 860130 1162 Sunset/Parkside 0 ## 5 1981 322749 1463 Outer Richmond 3
class(sfhomes) # what is the data object type? dim(sfhomes) # how many rows and columns str(sfhomes) # display the structure of the object head(sfhomes) # take a look at the first 10 records summary(sfhomes) # explore the range of values summary(sfhomes$totvalue) # explore the range of values for one column hist(sfhomes$totvalue) # histogram for the totvalue column
Use the R base plot function to create a simple map
plot(sfhomes$lon, sfhomes$lat) # using base plot function
Use the R base plot function to create a simple map
plot(sfhomes$lon, sfhomes$lat) # using base plot function
ggplot2ggplot2Most widely used plotting library in R
Not specifically for geospatial data
But can be used to make fabulous maps
Great choice if you already know ggplot2
ggplot2Load the library
library(ggplot2)
ggplot2Basic map with ggplot
library(ggplot2) ggplot() + geom_point(data=sfhomes, aes(lon,lat))
ggplot2Basic map with ggplot
ggplot() + geom_point(data=sfhomes, aes(lon,lat), size=1)
ggplot() + geom_point(data=sfhomes, aes(lon,lat), size=1) + coord_map()
Allows you to associate a map projection with geographic coord data.
Map Projection: mathematial transformation from curved to flat surface

A Projected CRS applies a map projection to a Geographic CRS
All introduce distortion,
in shape, area, distance, direction, or combo
the larger the area the greater the distortion
No one map projection best for all purposes
Selection depends on location, extent and purpose

coord_map("mercator")Mercator is the default map projection used by the get_coord() function.
It’s important to know that you can customize the map projection and parameters if you wish.
But doing this is beyond the scope of this workshop.
totvalueData driven symbology
ggplot() + geom_point(data=sfhomes, aes(lon,lat, col=totvalue)) + coord_map()
totvalueWhat’s happening here?
sfhomes_low2high <- sfhomes[order(sfhomes$totvalue, decreasing = FALSE),] ggplot() + geom_point(data=sfhomes_low2high, aes(lon,lat, col=totvalue)) + coord_map()
Try it - Does the output map look different from previous one?
The order of the data in the data frame changes the map display!
Map the sfhomes data in decreasing order by totvalue.
totvaluesfhomes_high2low <- sfhomes[order(sfhomes$totvalue, decreasing = T),] ggplot() + geom_point(data=sfhomes_high2low, aes(lon,lat, col=totvalue)) + coord_map()
ggplot GoodnessWhat does this code do?
sfhomes2010_15 <- subset(sfhomes_low2high, as.numeric(SalesYear) > 2009) ggplot() + geom_point(aes(lon, lat, col=totvalue), data = sfhomes2010_15 ) + facet_wrap(~ SalesYear)
ggplot GoodnessVisual spatial analysis!
ggmap extends ggplotIt allows you to:
Create basemaps on which you can display your data.
Use the Google Geocoding API to get coordinates for place names and addresses (i.e. to ‘geocode’)
and more…
Load the libary
library(ggmap)
ggmap authorizationSome ggmap functionality may require you to register a Google API key
Beginning July 2018 Google requires a credit card on file to access online APIs
The setup of an API key is beyond the scope of this workshop. So, we we’ll only call Google APIs as a demo in this tutorial. This way, at least you can still see what’s available to you.
ggmap installation and usageIf you happen to already have a configured Google Maps API key (with Geocoding, Geolocation, and Static Maps APIs all enabled), then you are welcome to try this on your machine by running:
register_google(<YOUR_API_KEY_HERE>)
(If you have problems with ggmap you may need to reinstall the library from github.
Otherwise, just watch for the next few steps!
#devtools::install_github("dkahle/ggmap")
library(ggmap)
Let’s check out the get_map function. This will allow us to fetch map tiles to display in our maps.
get_map uses a number of online mapping services including Google, Stamen and OpenStreetMap (OSM)
get_map requires as input a location, expressed as point coordinates.
See ?get_map for details
Let’s fetch a simple map on which we can display our points
Here we are centering the map on San Francisco & using the stamen toner-lite basemap.
#Load my API key from file
#(NOTE: THIS WON'T RUN ON OTHERS' MACHINES!)
filename = '/home/drew/Desktop/gmapi.txt'
register_google(gsub('\n', '', readChar(filename, file.info(filename)$size)))
sf_center_point <- c(lon=-122.445144, lat=37.769335 ) sf_map <- get_map(sf_center_point)
## Source : https://maps.googleapis.com/maps/api/staticmap?center=37.769335,-122.445144&zoom=10&size=640x640&scale=2&maptype=terrain&language=en-EN&key=xxx
ggmap to display the mapggmap(sf_map)
ggmap with point overlaySyntax similar to ggplot
But ggmap takes the name of the map object returned by get_map()
# ggplot() + ggmap(sf_map) + geom_point(data=sfhomes, aes(x=lon, y=lat, col=totvalue))
ggmap with point overlayggmap(sf_map) + geom_point(data=sfhomes, aes(x=lon, y=lat, col=totvalue))
That map is fine but the zoom level isn’t right.
We want to zoom in on the data by setting the map extent.
get_map ExtentWe can set the map extent by centering the map on our data and increasing the zoom level.
get_map prints a lot of messages to the screen!# Get the center point of the data
sf_ctr <- c(lon = mean(sfhomes$lon), lat = mean(sfhomes$lat))
sf_ctr # take a look
# create the map - setting the zoom level to 12
sf_basemap <- get_map(sf_ctr, source="stamen",
maptype="toner-lite", zoom=12)
## Source : https://maps.googleapis.com/maps/api/staticmap?center=37.758403,-122.439444&zoom=12&size=640x640&scale=2&maptype=terrain&key=xxx
get_map ExtentNow, view the data on top of the sf_basemap with ggmap
ggmap(sf_basemap) + geom_point(data=sfhomes, aes(x=lon, y=lat, col=totvalue))
Finally, let’s add another geospatial data layer to our ggmap.
You can use this method to add as many layers as you want to a ggmap or ggplot.
Use the read.csv function to read in a file of Bart Station locations. What is the name of the column with the longitude values? latitude?
bart <- read.csv("./data/bart.csv")
# take a look
head (bart)
## X Y STATION OPERATOR DIST CO ## 1 -122.2833 37.87406 NORTH BERKELEY BART 4 ALA ## 2 -122.2682 37.86969 DOWNTOWN BERKELEY BART 4 ALA ## 3 -122.2701 37.85321 ASHBY BART 4 ALA ## 4 -122.2518 37.84451 ROCKRIDGE BART 4 ALA ## 5 -122.2671 37.82871 MACARTHUR BART 4 ALA ## 6 -122.2684 37.80808 19TH STREET/OAKLAND BART 4 ALA
For the maps from here on out, to deal with a smaller example dataset, we’re going to also subset our data for only those rows that pertain to year 2015.
sfhomes15 <- subset(sfhomes, as.numeric(SalesYear) == 2015)
sfmap_with_bart <- ggmap(sf_basemap) + geom_point(data=sfhomes15, aes(x=lon, y=lat)) + geom_point(data=bart, aes(x=X,y=Y), col="red", size=3)
sfmap_with_bart
## Warning: Removed 34 rows containing missing values (geom_point).
That’s all we’ll do with ggmap maps for now, since you’re probably not able to run the code yourself on your own computer.
But we’ll get a quick glimpse of ggmap’s geocoding functionality toward the end of today’s material.
SF Landmarks
data/landmarks.csv
landmarks <- read.csv("./data/landmarks.csv")
head(landmarks)
## X Y id name ## 1 -13626151 4551548 1 Coit Tower ## 2 -13630804 4544523 2 Twin Peaks ## 3 -13627654 4548289 3 City Hall ## 4 -13635101 4546904 4 Golden Gate Park
Let’s create a map of the SF homes, BART Stations and Landmarks all together.
sfmap_bart_landmarks <- ggplot() +
geom_point(data=sfhomes15, aes(x=lon, y=lat)) +
geom_point(data=bart, aes(x=X,y=Y), col="red", size=3) +
geom_point(data=landmarks, aes(x=X,y=Y), shape=22,
col="black", fill="grey", size=4)
All good - are all layers displayed? If not, why not?
There are limits to what you can do with geospatial data stored in a data frame.
The Landmark data do not have geographic coordinates - longitude and latitude.
You can’t map these with ggmap.
## X Y id name ## 1 -13626151 4551548 1 Coit Tower ## 2 -13630804 4544523 2 Twin Peaks ## 3 -13627654 4548289 3 City Hall ## 4 -13635101 4546904 4 Golden Gate Park
The ESRI Shapefile is the most common file format for geospatial data.
ggplot and ggmap cannot directly read in or plot shapefile data.
ggplot & ggmap can’t answer questions like
What properties are in the Noe Valley neighborhood?
What is the average property value in each SF neighborhood?
What is the area of each SF Neighborhood and the property density?
What properties are within walking distance (.25 miles) of the Mission neighborhood?
You need spatial data objects and spatial methods for that!
sf packagesf vs. spThe sf package is the most commonly used to construct and manipulate spatial data objects in R.
sf supersedes the package sp and its ecosystem of related packages (mainly rgeos and rgdal). As such, it is a one-stop shop for core geospatial data objects and operations that used to be spread across those 3 packages.
(sp is still frequently used, but its spatial objects are a bit less streamlined. It will be necessary for our raster work on Day 3, so we’ll see a bit of it then.)
sf packagesf stands for ‘simple features’, which is a standard (developed by the Open Geospatial Consortium) for storing various geometry types in a hierarchical data model.
A ‘feature’ is just a representation of a thing in the real world (e.g. a building, a city…). In other words, each feature consists of both a geometric representation of an object and some other information about it (building: height, name, etc…, city: population, area, etc…).
sf packageHere are the most common simple features geometries, which are used represent vector data in sf.
(From the Geocomputation in R textbook, Chapter 2, Figure 2.2)
sf packagesf offers numerous specific benefits, including:
data.frames)%>% piping syntaxst_)tmap)sf packageFirst, of course, we’ll need to load the package and dplyr which it depends on:
library(dplyr) library(sf)
sf objects: IOWe can then read in a spatial dataset into an sf object using the st_read function.
Here we’re reading in a shapefile of SF census tracts:
tracts = st_read(dsn = './data', layer = 'sftracts')
## Reading layer `sftracts' from data source `/home/drew/Desktop/stuff/berk/dlab/RGeo_workshop/data' using driver `ESRI Shapefile' ## Simple feature collection with 195 features and 8 fields ## geometry type: MULTIPOLYGON ## dimension: XY ## bbox: xmin: -13638250 ymin: 4538276 xmax: -13617440 ymax: 4560139 ## epsg (SRID): NA ## proj4string: +proj=merc +lon_0=0 +lat_ts=0 +x_0=0 +y_0=0 +a=6378137 +b=6378137 +units=m +no_defs
sf objects: structureThen, as always, we can explore the basic aspects of the object returned, using base R functions:
#the object displays a compact summary, when its name is called tracts
sf objects: structure#the object displays a compact summary, when its name is called tracts
sf objects: structureWhat sort of object is this?
#the object is of both the 'sf' and 'data.frame' classes class(tracts)
## [1] "sf" "data.frame"
#it has a number of columns (i.e. attributes, fields), including a geometry column str(tracts)
## Classes 'sf' and 'data.frame': 195 obs. of 9 variables: ## $ STATEFP : Factor w/ 1 level "06": 1 1 1 1 1 1 1 1 1 1 ... ## $ COUNTYFP: Factor w/ 1 level "075": 1 1 1 1 1 1 1 1 1 1 ... ## $ TRACTCE : Factor w/ 195 levels "010100","010200",..: 164 169 178 184 191 70 26 42 129 155 ... ## $ AFFGEOID: Factor w/ 195 levels "1400000US06075010100",..: 164 169 178 184 191 70 26 42 129 155 ... ## $ GEOID : Factor w/ 195 levels "06075010100",..: 164 169 178 184 191 70 26 42 129 155 ... ## $ NAME : Factor w/ 195 levels "101","102","103",..: 164 169 178 184 191 70 26 42 129 155 ... ## $ LSAD : Factor w/ 1 level "CT": 1 1 1 1 1 1 1 1 1 1 ... ## $ ALAND : num 372758 293133 476606 276334 1022568 ... ## $ geometry:sfc_MULTIPOLYGON of length 195; first list element: List of 1 ## ..$ :List of 1 ## .. ..$ : num [1:9, 1:2] -13637736 -13636969 -13636954 -13636940 -13636925 ... ## ..- attr(*, "class")= chr "XY" "MULTIPOLYGON" "sfg" ## - attr(*, "sf_column")= chr "geometry" ## - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA ## ..- attr(*, "names")= chr "STATEFP" "COUNTYFP" "TRACTCE" "AFFGEOID" ...
#we can use basic data.frame functions on it nrow(tracts)
## [1] 195
ncol(tracts)
## [1] 9
colnames(tracts)
## [1] "STATEFP" "COUNTYFP" "TRACTCE" "AFFGEOID" "GEOID" "NAME" ## [7] "LSAD" "ALAND" "geometry"
head(tracts)
## Simple feature collection with 6 features and 8 fields ## geometry type: MULTIPOLYGON ## dimension: XY ## bbox: xmin: -13637840 ymin: 4538290 xmax: -13624700 ymax: 4549564 ## epsg (SRID): NA ## proj4string: +proj=merc +lon_0=0 +lat_ts=0 +x_0=0 +y_0=0 +a=6378137 +b=6378137 +units=m +no_defs ## STATEFP COUNTYFP TRACTCE AFFGEOID GEOID NAME LSAD ## 1 06 075 035202 1400000US06075035202 06075035202 352.02 CT ## 2 06 075 042601 1400000US06075042601 06075042601 426.01 CT ## 3 06 075 047801 1400000US06075047801 06075047801 478.01 CT ## 4 06 075 060502 1400000US06075060502 06075060502 605.02 CT ## 5 06 075 980200 1400000US06075980200 06075980200 9802 CT ## 6 06 075 018000 1400000US06075018000 06075018000 180 CT ## ALAND geometry ## 1 372758 MULTIPOLYGON (((-13637736 4... ## 2 293133 MULTIPOLYGON (((-13634865 4... ## 3 476606 MULTIPOLYGON (((-13636354 4... ## 4 276334 MULTIPOLYGON (((-13628231 4... ## 5 1022568 MULTIPOLYGON (((-13637838 4... ## 6 937021 MULTIPOLYGON (((-13626895 4...
sf objects: basic plottingWe can plot an sf object using its plot method.
In other words, when we just call R’s base plot function on an sf object, R will recognize that it’s an sf object and thus plot it accordingly.
plot(tracts)
sf objects: basic plottingplot(tracts)
sf objects: basic plottingNote that we get an array of plots, one for each variable (or ‘field’, or ‘attribute’) in our dataset (up to the first 9).
So then we should be able to plot a single variable by just subsetting the sf dataframe for that variable, then plotting the subsetted dataframe.
Plot just the ‘NAME’ column’s data.
(Note: This will be an example of what we call a ‘choropleth’ map.)
#read in a shapefile of SF census tracts plot(tracts['NAME'])
Some of you may have gotten this plot instead:
plot(tracts$NAME)
What went wrong?
class(tracts['NAME'])
## [1] "sf" "data.frame"
class(tracts[, 'NAME'])
## [1] "sf" "data.frame"
class(subset(tracts, select='NAME'))
## [1] "sf" "data.frame"
class(tracts$NAME)
## [1] "factor"
When we use bracket syntax or the subset function, sf objects return new, subsetted sf objects.
But when we use the ‘$’ notation, we just get a vector of the column’s values!
As always, we need to be careful and check what kinds of objects we’re working with!
sf objects: geometriesThe nice thing about sf objects is that they are just data.frames!
The geometry data is just stored in its own special column, usually named geom or geometry.
For the most part, we will not want to manually manipulate the data in the geometry column. But when we’re just getting started, it can be helpful to tinker.
sf objects: geometriesAs we saw earlier, the geometry data is stored in its own column. Let’s take a closer look at that column:
tracts$geometry
sf objects: geometriestracts$geometry
## Geometry set for 195 features ## geometry type: MULTIPOLYGON ## dimension: XY ## bbox: xmin: -13638250 ymin: 4538276 xmax: -13617440 ymax: 4560139 ## epsg (SRID): NA ## proj4string: +proj=merc +lon_0=0 +lat_ts=0 +x_0=0 +y_0=0 +a=6378137 +b=6378137 +units=m +no_defs ## First 5 geometries:
## MULTIPOLYGON (((-13637736 4546153, -13636969 45...
## MULTIPOLYGON (((-13634865 4549210, -13634268 45...
## MULTIPOLYGON (((-13636354 4548053, -13635877 45...
## MULTIPOLYGON (((-13628231 4538555, -13628045 45...
## MULTIPOLYGON (((-13637838 4548684, -13637471 45...
sf objects: geometriesOne thing we can see is that our CRS and some other metadata is stored as part of this column. This includes:
sf objects: geometriesOf course, all the geometries in the column must have the same CRS.
We can check the CRS of an sf object using the st_crs function:
st_crs(tracts)
## Coordinate Reference System: ## No EPSG code ## proj4string: "+proj=merc +lon_0=0 +lat_ts=0 +x_0=0 +y_0=0 +a=6378137 +b=6378137 +units=m +no_defs"
sf objects: geometriesAnd we can get our bounding box using st_bbox, or subset certain values:
bbox = st_bbox(tracts) bbox
## xmin ymin xmax ymax ## -13638250 4538276 -13617442 4560139
bbox$xmin
## xmin ## -13638250
sf objects: geometriesWe can also see that this column is some sort of special object.
Unlike with the ‘NAME’ column, when we subset the ‘geometry’ column with ‘$’ we don’t just get a vector of values.
Of course, we can’t just get a vector, because the values are complex objects (geometries).
Instead, we get a “list column”.
sf objects: geometriesSo, what sort of object is this?
class(tracts$geometry)
## [1] "sfc_MULTIPOLYGON" "sfc"
An sfc object!
This is short for ‘simple features collection’, which is basically just an R list of geometries (because as we just said, a vector wouldn’t work).
sf objects: geometriesSo then, what sort of object is each individual geometry?
(Note the double-bracket notation, which indexes values out of a list. Single brackets would just return us a new, subsetted list—in this case, a new, subsetted sfc object.)
class(tracts$geometry[[1]])
## [1] "XY" "MULTIPOLYGON" "sfg"
It’s an object of the XY, MULTIPOLYGON, and sfg classes!
sf objects: geometriesHere’s what those classes mean:
sfg is short for ‘simple features geometry’; this is the geometry data itself!
XY just indicates the dimensionality of that geometry (for all our purposes this will be XY, but some less commonly used data models include a Z dimension)
MULTIPOLYGON just indicates the type of that geometry (which, as we saw earlier, could also be POINT, LINESTRING, …)
sf objects: geometriesA simple features geometry is typically either stored as well-known text (WKT) or well-known binary (WKB).
The latter is more easily machine-readable. But the former is human-readable, and is what we see in an sf geometry column.
sf objects: geometriesThe WKT is just a very simple written representation of the ‘connect-the-dots’ vector data model.
Here’s a point: POINT (2 4)
Here’s a multipoint: MULTIPOINT (2 2, 3 3, 3 2)
Here’s a linestring: LINESTRING (0 3, 1 4, 2 3)
And here’s a polygon: POLYGON ((1 0, 3 4, 5 1, 1 0))
(Notice that the polygon’s first and last coordinate-pairs are the same!)
sf objects: geometriesTypically, you won’t need to create your own geometries from scratch because you’ll just be reading them in from geospatial data files.
But sf does provide functions for doing so.
Here are the geometries from our previous slide:
#create a point pnt = st_point(c(2, 4)) #create a multipoint mpnt = st_multipoint(rbind(c(2, 2), c(3, 3), c(3, 2))) #create a line line = st_linestring(rbind(c(0, 3), c(1, 4), c(2, 3))) #create a polygon poly = st_polygon(list(rbind(c(1, 0), c(3, 4), c(5, 1), c(1, 0))))
sf objects: geometriesThen we can plot them.
plot(poly, col = 'yellow') plot(line, col = 'blue', lwd = 3, add = T) plot(mpnt, col = 'red', add = T) plot(pnt, col = 'black', add = T)
##sf objects: geometries
For comparison, we can take a look at one of the sfg objects in our census tracts:
geom = tracts$geometry[[1]] str(geom)
## List of 1 ## $ :List of 1 ## ..$ : num [1:9, 1:2] -13637736 -13636969 -13636954 -13636940 -13636925 ... ## - attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
geom[[1]]
## [[1]] ## [,1] [,2] ## [1,] -13637736 4546153 ## [2,] -13636969 4546189 ## [3,] -13636954 4545919 ## [4,] -13636940 4545657 ## [5,] -13636925 4545394 ## [6,] -13637652 4545354 ## [7,] -13637682 4545615 ## [8,] -13637709 4545877 ## [9,] -13637736 4546153
Again, note that the first and last coordinate-pairs are the same, because this is a polygon!
sf objects: geometriesSo this should clarify the geometry column:
It is a simple features collection (sfc), where each item is just a simple features geometry (sfg).
Each sfg holds a WKT representation of the geometry (sfg), and pertains to a row, which typically contains other data describing the row.
sf objects: summaryAnd that brings us full circle!
Now we can see how we could build our own sf object.
We would just combine a plain old dataframe with a geometry column for it.
df = data.frame(list('name' = c('someplace'), 'pop' = c(2)))
new_sf = st_sf(df, geometry = st_sfc(poly))
sf objects: summarydf = data.frame(list('name' = c('someplace'), 'pop' = c(2)))
new_sf = st_sf(df, geometry = st_sfc(poly))
new_sf
## Simple feature collection with 1 feature and 2 fields ## geometry type: POLYGON ## dimension: XY ## bbox: xmin: 1 ymin: 0 xmax: 5 ymax: 4 ## epsg (SRID): NA ## proj4string: NA ## name pop geometry ## 1 someplace 2 POLYGON ((1 0, 3 4, 5 1, 1 0))
sf objects: summaryplot(new_sf)
sf objects: summaryTo recap:
sf objects are data.frame objects with special ‘geom’ or ‘geometry’ columns..sfc) object.sfc is a simple features geometry (sfg) object.(From the sf docs)
sf objects: summaryAnd here’s another depicition (from the sf Github Repo):
sf objects: summaryAnd here’s our census tracts data again. Hopefully now you can see all the pieces.
tracts
## Simple feature collection with 195 features and 8 fields ## geometry type: MULTIPOLYGON ## dimension: XY ## bbox: xmin: -13638250 ymin: 4538276 xmax: -13617440 ymax: 4560139 ## epsg (SRID): NA ## proj4string: +proj=merc +lon_0=0 +lat_ts=0 +x_0=0 +y_0=0 +a=6378137 +b=6378137 +units=m +no_defs ## First 10 features: ## STATEFP COUNTYFP TRACTCE AFFGEOID GEOID NAME LSAD ## 1 06 075 035202 1400000US06075035202 06075035202 352.02 CT ## 2 06 075 042601 1400000US06075042601 06075042601 426.01 CT ## 3 06 075 047801 1400000US06075047801 06075047801 478.01 CT ## 4 06 075 060502 1400000US06075060502 06075060502 605.02 CT ## 5 06 075 980200 1400000US06075980200 06075980200 9802 CT ## 6 06 075 018000 1400000US06075018000 06075018000 180 CT ## 7 06 075 012501 1400000US06075012501 06075012501 125.01 CT ## 8 06 075 015200 1400000US06075015200 06075015200 152 CT ## 9 06 075 026403 1400000US06075026403 06075026403 264.03 CT ## 10 06 075 032901 1400000US06075032901 06075032901 329.01 CT ## ALAND geometry ## 1 372758 MULTIPOLYGON (((-13637736 4... ## 2 293133 MULTIPOLYGON (((-13634865 4... ## 3 476606 MULTIPOLYGON (((-13636354 4... ## 4 276334 MULTIPOLYGON (((-13628231 4... ## 5 1022568 MULTIPOLYGON (((-13637838 4... ## 6 937021 MULTIPOLYGON (((-13626895 4... ## 7 135257 MULTIPOLYGON (((-13627070 4... ## 8 279195 MULTIPOLYGON (((-13629456 4... ## 9 379382 MULTIPOLYGON (((-13626769 4... ## 10 671900 MULTIPOLYGON (((-13636086 4...
sf objects: plottingSo what about our homes data? Let’s plot them with the tracts data!
To do that, we’ll need to use ggplot (because the homes data is not an sf object). The geom_sf function will allow us to add data from an sf object to a ggplot.
ggplot() + geom_sf(data = tracts) + geom_point(data = sfhomes15, aes(lon, lat, col = totvalue))
sf objects: plottingggplot() + geom_sf(data = tracts) + geom_point(data = sfhomes15, aes(lon, lat, col = totvalue))
sf objects: plottingWhat happened?
Our data wound up at totally opposite ends of our map!
Why?
sf objects: plottingWhat’s the CRS of our census tracts?…
st_crs(tracts)
## Coordinate Reference System: ## No EPSG code ## proj4string: "+proj=merc +lon_0=0 +lat_ts=0 +x_0=0 +y_0=0 +a=6378137 +b=6378137 +units=m +no_defs"
And what’s the CRS of the SF homes data?…
st_crs(sfhomes15, n = 1)
## Coordinate Reference System: NA
We can’t plot our data if it’s not in the same CRS!
The #1 reason…

Every spatial object needs a reference to its CRS.
If it is lacking this reference, we’ll need to define it.
Note the important difference between defining and transforming projections!
Defining just makes a spatial object aware of the CRS of its geometries. (This does not change the data’s coordinate values.)
Transforming converts the geometries to a new CRS. (This changes the coordinate values.)
So we need our data all in the same projection. In order to do this, we’re going to need to do 3 things:
Make our homes an explicitly geospatial object, using sf.
Define (or assign, or set) that object’s CRS.
Transform (or reproject) one of our two objects to the CRS of the other.
sf objectsFirst, we need to coerce our homes data to an sf object.
We saw a crude example of constructing an sf object, above.
But we can actually do this in a single line of code, using the st_as_sf function!
sf objectsTo do this, we need to provide 3 pieces of information to 3 arguments:
x: The object to be converted to an sf object
coords: The columns containing the coordinates.
crs: The CRS of the coordinates contained in those columns.
In order to this, we need to know the CRS of our homes data.
So, what is it?…
The coordinate reference system (CRS) defines the mathematical model of the planet within which the locations of spatial data objects are expressed.
A CRS can be unprojected (if coordinates are expressed as positions on a spheroidal surface), or projected (if they’re expressed on a plane resulting from projecting that surface).
GIS software will have a database of definitions for thousands of Earth-referenced CRSs and methods for using those definitions to define or transform a CRS.
Our homes data are just decimal degrees on a spheroid. In other words, they are unprojected longitude & latitude (‘longlat’).
The most common datum and ellipsoid for that sort of CRS are ‘WGS84’.
So we can express our CRS in a proj4 string as follows:
CRS("+proj=longlat +ellps=WGS84 +datum=WGS84")
proj4 strings are a standard format.
However, the proj4 string format is long and complicated. And there are lots of CRSs out there!
So the European Petroluem Survey Group devised a simpler system of numerical codes, called EPSG codes.
The longlat, WGS84 EPSG code is:
CRS("+init=epsg:4326")
Some additional info on CRS definitions:
Proj4 is the standard library for defining and transforming map projections and coordinate reference systems.
Here is an example file of proj4 CRS definitions
We want all data in the same CRS
Which one is best?
Geographic CRSs
4326 Geographic, WGS84 (default for lon/lat)
4269 Geographic, NAD83 (USA Fed agencies like Census)
Projected CRSs
5070 USA Contiguous Albers Equal Area Conic
3310 CA ALbers Equal Area
26910 UTM Zone 10, NAD83 (Northern Cal)
3857 Web Mercator (web maps)
See http://spatialreference.org/
Use this site to find EPSG codes and proj4 CRS strings
sf objectsNow that we know our homes have are in unprojected coordinates with a WGS84 datum (i.e. EPSG code: 4326), we can call st_as_sf as follows:
sfhomes15_sf = st_as_sf(sfhomes15, coords = c('lon', 'lat'), crs = 4326)
Once we set the CRS we can check it with st_crs:
st_crs(sfhomes15_sf)
## Coordinate Reference System: ## EPSG: 4326 ## proj4string: "+proj=longlat +datum=WGS84 +no_defs"
Looks good!
That takes care of both steps 1 (create sf object) and 2 (define CRS)!
Now that our homes are an sf object with the correct CRS, we’ll need to reproject either sfhomes15_sf or tracts, so that they are in the same projection.
Which one should we project, and why?…
There’s no universally right answer to that question. Usually the answer will depend on what operations you plan to run downstream.
In our case, we’ll reproject our tracts, so that everything is in an unprojected CRS and plots with units of decimal degrees.
Note: If we were working with sp objects, we would use the rgdal package to carry out CRS transformations.
However, sf provides a one stop shop for all sorts of common geospatial operations, so we can just stick with sf functions.
Using sf, we can reproject, or transform an sf object using the st_transform function.
The only arguments we need are:
x: The object to be transformed
crs: The CRS to transform x to (which can be provided as just an integer of the EPSG code)
Transform the tracts object to the same CRS as the sfhomes15_sf object. Name the new object ‘tracts_lonlat’ (to indicate that its CRS is unprojected longitude and latitude).
We can transform the tracts to the CRS of sfhomes15_sf using:
tracts_lonlat = st_transform(tracts, crs = 4326)
However, note that if we want to be super explicit, to be certain things match up, we can index the EPSG code directly out of sfhomes15_sf’s CRS object, as follows:
tracts_lonlat = st_transform(tracts, crs = st_crs(sfhomes15_sf)$epsg)
Are they both defined? Are they the same?
st_crs(sfhomes15_sf)
## Coordinate Reference System: ## EPSG: 4326 ## proj4string: "+proj=longlat +datum=WGS84 +no_defs"
st_crs(tracts_lonlat)
## Coordinate Reference System: ## EPSG: 4326 ## proj4string: "+proj=longlat +datum=WGS84 +no_defs"
st_crs(sfhomes15_sf) == st_crs(tracts_lonlat)
## [1] TRUE
sf objects: plottingAnd now we can try again to plot our data together.
(Note that we can ues the aes function to do aesthetic mapping in an geom_sf call, just as we could with geom_point and other ggplot functions.)
ggplot() + geom_sf(data = tracts_lonlat) +
geom_sf(data = sfhomes15_sf, aes(col = totvalue))
sf objects: plottingSuccess!
Work through the following steps:
Coerce the landmarks dataframe to an sf object, with EPSG: 3857, Web Mercator as its CRS.
Read in the ‘sfboundary’ and ‘sfhighways’ shapefiles, from the ‘./data’ subdirectory.
Reproject any of those layers to the CRS of sfhomes15_sf, as needed.
Plot tracts, boundary, highways, homes, and landmarks together. Make the border purple, and the highways and the landmarks red. Color the homes by the ‘totvalue’ column.
1: Coerce the landmarks dataframe to an sf object, with EPSG: 3857, Web Mercator as its CRS.
landmarks_sf = st_as_sf(landmarks, coords = c('X', 'Y'), crs = 3857)
2: Read in the ‘sfboundary’ shapefile, from the ‘./data’ subdirectory.
sfboundary = st_read('./data', 'sfboundary')
## Reading layer `sfboundary' from data source `/home/drew/Desktop/stuff/berk/dlab/RGeo_workshop/data' using driver `ESRI Shapefile' ## Simple feature collection with 1 feature and 3 fields ## geometry type: POLYGON ## dimension: XY ## bbox: xmin: -122.5151 ymin: 37.70825 xmax: -122.357 ymax: 37.81176 ## epsg (SRID): 4326 ## proj4string: +proj=longlat +datum=WGS84 +no_defs
sfhighways = st_read('./data', 'sfhighways')
## Reading layer `sfhighways' from data source `/home/drew/Desktop/stuff/berk/dlab/RGeo_workshop/data' using driver `ESRI Shapefile' ## Simple feature collection with 246 features and 5 fields ## geometry type: LINESTRING ## dimension: XY ## bbox: xmin: 543789.3 ymin: 4173548 xmax: 551295.9 ymax: 4183929 ## epsg (SRID): 26910 ## proj4string: +proj=utm +zone=10 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
3: Reproject any of those layers to the CRS of sfhomes15_sf, as needed.
sfboundary doesn’t need to be transformed
#check the CRS of sfboundary st_crs(sfboundary) == st_crs(sfhomes15_sf)
## [1] TRUE
3: Reproject any of those layers to the CRS of sfhomes15_sf, as needed.
sfhighways needs to be transformed
#check th CRS of sfhighways st_crs(sfhighways) == st_crs(sfhomes15_sf)
## [1] FALSE
#it needs to be transformed sfhighways_lonlat = st_transform(sfhighways, st_crs(sfhomes15_sf))
3: Reproject any of those layers to the CRS of sfhomes15_sf, as needed.
We know landmarks does, because we just assinged it EPSG 3857.
landmarks_lonlat = st_transform(landmarks_sf, st_crs(sfhomes15_sf))
4: Plot tracts, boundary, highways, homes, and landmarks together. Make the border purple, and the highways and the landmarks red. Color the homes by the ‘totvalue’ column.
challenge_map = ggplot() + geom_sf(data = sfboundary, col = 'purple') + geom_sf(data = tracts_lonlat, alpha = 0.2) + #alpha = 0.2 for transparency, so we can see sfboundary geom_sf(data= sfhighways_lonlat, col = 'red') + geom_sf(data = sfhomes15_sf, aes(col = totvalue)) + geom_sf(data = landmarks_sf, col = 'red')
challenge_map
Excellent! Everything lines up!
ggmapNote: Now that we’ve transformed our landmarks’ coordinates to geographic coordinates, let’s see how to get geocoded coordinates from ggmap, to see how the results compare:
NOTE: ggmap::geocode prints a lot of output.
#prep our names for geocoding with the Google Maps API place_names = paste0(landmarks$name, ', San Francisco, CA') #geocode them geocode_coords = geocode(place_names)
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Coit+Tower,+San+Francisco,+CA&key=xxx
## "Coit Tower, San ..." not uniquely geocoded, using "coit tower, san francisco, ca 94133, usa"
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Twin+Peaks,+San+Francisco,+CA&key=xxx
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=City+Hall,+San+Francisco,+CA&key=xxx
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Golden+Gate+Park,+San+Francisco,+CA&key=xxx
ggmapLet’s look at the coordinates we transformed to lon/lat.
#transformed coords landmarks_lonlat
## Simple feature collection with 4 features and 2 fields ## geometry type: POINT ## dimension: XY ## bbox: xmin: -122.4862 ymin: 37.75251 xmax: -122.4058 ymax: 37.80239 ## epsg (SRID): 4326 ## proj4string: +proj=longlat +datum=WGS84 +no_defs ## id name geometry ## 1 1 Coit Tower POINT (-122.4058 37.80239) ## 2 2 Twin Peaks POINT (-122.4476 37.75251) ## 3 3 City Hall POINT (-122.4193 37.77926) ## 4 4 Golden Gate Park POINT (-122.4862 37.76942)
ggmapHow do they compare to our geocoded coordinates?
#geocoded coords geocode_coords
## # A tibble: 4 x 2 ## lon lat ## <dbl> <dbl> ## 1 -122. 37.8 ## 2 -122. 37.8 ## 3 -122. 37.8 ## 4 -122. 37.8
ggmapWhy so little precision?
There are actually more significant digits. They’re just truncated by R’s display.
#geocoded coords cbind(geocode_coords$lon, geocode_coords$lat)
## [,1] [,2] ## [1,] -122.4058 37.80269 ## [2,] -122.4477 37.75441 ## [3,] -122.4193 37.77926 ## [4,] -122.4862 37.76942
ggmapNice! Extremely close.
(The ones that differ just come down to differences in where Google Maps places the pins for these landmarks versus where our dataset does.)
tmap packageHow do we manipulate more of the aspects in our plots?
How do we make our plots prettier?
What other options do we have for plotting sf objects?
For this, we’ll turn to tmap.
tmapWe’ve already mapped with base R’s plot, ggplot, and ggmap.
We’ve seen how to control some plot aesthetics, to make our maps data-rich.
We will continue to explore similar functionalities. But we’ll do so using the tmap package.
We won’t spend a ton of time reviewing tmap, because you’ll get more practice with it in Parts 2 and 3.
tmapAs we will see, tmap offers some crucial additional options:
sf and sp objects)tmaptmap stands for thematic map
It’s a powerful toolkit for creating maps with sf and sp objects, with less code than the alternatives
Syntax inspired by ggplot2 (but a bit simpler)
tmapFirst we’l load the library
library(tmap)
Note: You may need to install /load dependencies - ggplot2, RColorbrewer, classInt, leaflet libraries
tmap: quick tmapsWe can create quick tmaps with the qtm function:
qtm(sfhomes15_sf)
tmap: modestmap has 2 modes:
plot: makes static maps
view: makes interactive maps
Use the command tmap_mode to toggle between them:
tmap_mode("plot")
tmap_mode("view")
tmap: modesWe can create an interactive version of the previous map:
tmap_mode("view") # set tmap to interactive view mode
qtm(sfhomes15_sf) # Interactive - click on the points
tmap: modesNotice that this is a live, interactive map! Click to see Popups! Click on the Layer Selector!
tmap: modesConveniently, we can also toggle between the modes using the ttm function.
ttm()
## tmap mode set to plotting
ttm()
## tmap mode set to interactive viewing
ttm()
## tmap mode set to plotting
ttm()
## tmap mode set to interactive viewing
tmap: building custom mapsTo customize our tmaps, we need to use tmap’s more complex, ggplot-type syntax
tm_shape(tracts) + tm_polygons(col="beige", border.col="red", alpha=0.5)
tm_shape(<sf_object>) specifies the sf object
+ tm_<element>(...) specifies the symbology
plus other options for creating a publication ready map
tmap: mapping polygonsWe can add and customize polygons using tm_polygons.
tm_shape(tracts) + tm_polygons(col="beige", border.col="red", alpha = 0.4)
tmap: mapping polygonstm_shape(tracts) + tm_polygons(col="beige", border.col="red", alpha = 0.4)
#then we'll switch back to 'plot' mode ttm()
## tmap mode set to plotting
tmap: mapping pointsWe can also map and customize point data using tm_dots
tm_dots are a type of tm_symbols()
See ?tm_symbols for other types of point symbols.
tm_shape(sfhomes15_sf) + tm_dots(col="skyblue", size=.25)
tmap: mapping pointstm_shape(sfhomes15_sf) + tm_dots(col="skyblue", size=.25)
tmap: mapping linesAnd we can add and customize lines using tm_lines.
tm_shape(sfhighways_lonlat) + tm_lines(col="black")
tmap: mapping linestm_shape(sfhighways_lonlat) + tm_lines(col="black")
tmap: mapping aesthetics onto data valuesThis works the same as with ggplot, except for two important things to note:
aes aesthetic-mapping functiontm_shape(sfhomes15_sf) + tm_dots(col="totvalue", size=.25)
tmap: mapping aesthetics onto data valuestm_shape(sfhomes15_sf) + tm_dots(col="totvalue", size=.25)
tmap: mapping multiple layersWe can overlay multiple sf objects, or layers, by concatenating the tmap commands with plus signs (again, using ggplot-style syntax).
# Map the SF Boundary first overlay_map = tm_shape(sfboundary) + tm_polygons(col="beige", border.col="black") + # Overlay the highway lines next tm_shape(sfhighways_lonlat) + tm_lines(col="black") + # Then add the house points tm_shape(sfhomes15_sf) + tm_dots(col="totvalue", size=.25)
tmap: mapping multiple layersoverlay_map
tmap: mapping multiple layersWhat if we then want to add our landmarks? Does this code work?
overlay_map +
tm_shape(landmarks_lonlat) +
tm_dots(col = 'skyblue', size = 2)
tmap: mapping multiple layersYup!
tmap: CRS managementCan we redo it using landmarks_sf instead of landmarks_lonlat?
overlay_map +
tm_shape(landmarks_sf) +
tm_dots(col = 'skyblue', size = 2)
tmap: CRS managementYup! What does that tell us about tmap?
tmap: CRS managementtmap reprojects for us on the fly!
If the CRSs are defined, tmap will use that info - if not, tmap will assume WGS84 -
and dynamically reproject subsequent layers to match first one added to map.
tmap: advanced customizationWe can of course tweak all sorts of details in our maps.
tm_shape(sfboundary) +
tm_polygons(col="beige", border.col="black") +
tm_shape(sfhighways_lonlat) +
tm_lines(col="black") +
tm_shape(sfhomes15_sf) +
tm_dots(col="totvalue", size=.25,
title = "San Francisco Property Values (2015)") +
tm_layout(inner.margins=c(.05, .2, .15, .05))
# bottom, left, top, right
tmap: advanced customizationtmap: advanced customizationWe can also customize the popups!
Notice the changes relative to the previous code.
tmap_mode('view')
tm_shape(sfboundary) +
tm_polygons(col="beige", border.col="black") +
tm_shape(sfhighways_lonlat) +
tm_lines(col="black") +
tm_shape(sfhomes15_sf) +
tm_dots(col="totvalue", size=.05,
title = "San Francisco Property Values (2015)",
popup.vars=c("SalesYear","totvalue","NumBedrooms",
"NumBathrooms","AreaSquareFeet")) +
tm_layout(inner.margins=c(.05, .2, .15, .05)) # bottom, left, top, right
tmap: advanced customizationtmap: advanced customizationAnd we can access a wide variety of basemaps.
(Here’s a list.)
Let’s try watercolor!
tm_basemap("Stamen.Watercolor") +
tm_shape(sfhomes15_sf) +
tm_dots(col="totvalue", size=.05, title = "San Francisco Property Values (2015)") +
tm_tiles("Stamen.TonerLabels")
tmap: advanced customizationLet’s try watercolor!
tmap: advanced customizationAnd here’s another type of basemap.
tmap_mode('view')
tm_basemap('OpenStreetMap.HOT') +
tm_shape(sfhomes15_sf) +
tm_dots(col="totvalue", size=.05, title = "San Francisco Property Values (2015)") +
tm_tiles("Stamen.TonerLabels")
tmap: advanced customizationAnd here’s another type of basemap.
tmap: saving and sharing mapsWe can use the tmap_last function to grab the last map we made.
Here, we save it to a named variable (‘map1’), then display it.
fav_map <- tmap_last()
tmap: saving and sharing mapsThen we can display it.
fav_map
tmap: saving and sharing mapsAnd then we can use tmap_save to save it.
Let’s save in a couple formats, then look at them.
Note: We can specify the output format by just using the file extension!
tmap_save(fav_map, "./output/SF_properties.png", height=6) # Static image file with tmap_save(fav_map, "./output/SF_properties.html") # interactive web map
tmap: saving and sharing mapsThere are many ways to publish your maps.
You can share you map online by publishing it to RPubs.
Enter the name of your tmap object (map1) or your tmap code in the console
In the Viewer window, click on the Publish icon.
tmap: saving and sharing mapsHere’s an RPubs Demo.
tmap: starting points?tmap
?tmap_shape
?tmap_element
?tm_polygons (tm_fill, tm_borders)
?tm_symbols (tm_dots, etc…)
The Geocomputation with R textbook (Lovelace, Nowosad, and Muenchow, 2019).
Using tmap instead of ggplot, recreate the map from our previous challenge.
Here’s the prompt again:
Plot tracts, boundary, highways, homes, and landmarks together. Make the border purple, and the highways and the landmarks red. Color the homes by the ‘totvalue’ column.
And here’s the map:
challenge_map
challenge_map_2 = tm_shape(sfboundary) +
tm_polygons(border.col = 'purple', alpha = 0) +
tm_shape(tracts_lonlat) +
tm_polygons(col = 'lightgray', border.col = 'darkgray', alpha = 0.3) +
tm_shape(sfhighways_lonlat) +
tm_lines(col = 'red') +
tm_shape(sfhomes15_sf) +
tm_dots(col = 'totvalue', size = 0.05, palette = '-Blues') +
tm_shape(landmarks_sf) +
tm_dots(col = 'red', size = 0.1)
challenge_map_2## Recap
sf library - spatial objects and methods in RsftmapThat’s the end of Part I!
See you in Part II, where we’ll cover spatial analysis!